Mean-Field Games with Explicit Interactions

Josu Doncel, Nicolas Gast, Bruno Gaujal

Univ. Grenoble Alpes, F-38000 Grenoble, France
Inria


Abstract 

We introduce mean-field games with explicit interactions. This model is a finite-state-space mean-field game where the evolution and the cost function of the individual players depend not only on the actions taken, but also on the population distribution. We analyze these games in continuous and discrete time, over finite as well as infinite time horizons. We show the existence of a mean-field equilibrium in this type of game using an adapted version of the Kakutani fixed point theorem. We also study the convergence of the equilibria of $N$-player games to mean-field equilibria. We define classes of strategies over which any equilibrium converges to a mean-field equilibrium when the number of players goes to infinity. We also exhibit equilibria outside this class that do not converge to mean-field equilibria. In discrete time, the same non-convergence phenomenon implies that the folk theorem does not scale to the mean-field limit. Finally, we construct a mean-field game with explicit interactions to study vaccination strategies in an SIR infection propagation model and compute its mean-field equilibrium in almost closed form. We also compare the Nash equilibrium with a centrally optimal strategy and show that, in all but degenerate cases, the equilibrium does not coincide with the optimal solution. We design a pricing mechanism that forces the equilibrium to coincide with an optimal vaccination strategy.


1. Introduction 

The history of game theory has been marked by milestone theorems concerning the existence of an equilibrium: the minimax theorem of Von Neumann [29], which shows the existence of an equilibrium for finite zero-sum games; the famous result of Nash [23], showing the existence of an equilibrium in finite games; the folk theorem (see for example [?]), which characterizes Nash equilibria in repeated games; or the mean-field game solution, introduced by Lasry and Lions [23], which shows the existence of equilibria in dynamic games with an infinite number of players.

In this paper, we introduce a variant of mean-field games and prove the existence 
of an equilibrium. We also show how this equilibrium is essentially different from 
the equilibria obtained when the number of players is finite. 

In general, mean-field games study strategic decision making in the presence of a huge number of rational, indistinguishable players. An important assumption in this type of game is that no individual player's action can affect the dynamics of the system (a.k.a. the population) and, conversely, that each player reacts to the distribution of the population over the players' states and not to the state of each individual player. The mean-field games considered here differ from previous definitions in one crucial point: the state dynamics of a player, as well as her strategy, depend on the aggregate state of the other players. This is called explicit interaction in the following. Other (milder) differences with the model of Lasry and Lions include the fact that we consider finite state spaces over a finite and/or infinite time horizon, and the fact that players may play simultaneously and/or asynchronously. We claim that this model with explicit interactions covers several natural phenomena, such as information/infection propagation or resource congestion, where the cost but also the state dynamics of a player depend on the state of the others. This type of behavior is classical in systems with a large number of interacting objects [2] and cannot be handled using previous mean-field game models. In the infection model, the rate of infection of one individual depends on the proportion of individuals already infected. In the congestion case, a player can barely use a resource if it is already heavily loaded.

Mean-field games with finite state spaces have been studied in [10] for the discrete time case and, for example, in [?] for the continuous time case. To our knowledge, in all these cases, the dynamics of one player do not depend on the state of the other players. Also, the existence of a Nash equilibrium in these papers is proved for strictly convex costs. This is a strong assumption because it does not cover the case of average costs.

Here, we consider general mean-field games with finite state spaces and with explicit dependence on the population distribution. We show that such games admit a Nash equilibrium under mild continuity assumptions by applying the Kakutani fixed point theorem for infinite-dimensional spaces. The main idea of the proof is not to consider the best-response operator directly but an aggregate operator instead, which satisfies all the conditions needed to apply the fixed point theorem.

We prove the existence of a mean-field equilibrium in several cases: over a finite or an infinite time horizon (with discounted costs) and under discrete time or continuous time dynamics.
In all four combinations, we prove that a mean-field equilibrium is an $\varepsilon$-approximation of an equilibrium of a corresponding game with a finite number $N$ of players, where $\varepsilon$ goes to 0 when $N$ goes to infinity. Conversely, however, not all equilibria of the finite version converge to a Nash equilibrium of the mean-field limit of the game. We provide several counter-examples to illustrate this fact. They are all based on the following idea: the “tit for tat” principle allows one to define many equilibria in repeated games with $N$ players. However, when the number of players is infinite, the deviation of a single player is not visible to the population, which therefore cannot punish her in retaliation for her deviation. This implies that while games with $N$ players may have many equilibria, as stated by the folk theorem, this may not be the case for the limit game. This fact seems to have been overlooked so far.

The model we analyze in this article is very general and we believe that a large variety of models fit in our framework. As an illustrative example, we present a vaccination strategy in an infection propagation model that we analyze thoroughly later in the paper. This is a typical example of a mean-field game with explicit interactions because the players become infected according to a mean-field dynamic: the rate of infection of one player at time $t$ is a function of the proportion of infected players, $m_I(t)$.

The rest of the article is organized as follows. 

Section 2 introduces mean-field games with explicit interactions in the continuous time case, with an infinite horizon and discounted costs as well as with a finite horizon. We describe the evolution of the state of the players, the cost function, and the best-response operator. In both cases (finite and infinite horizon), we prove the existence of an equilibrium. We also show in Section 3 that this equilibrium is an approximation of an equilibrium for the game with a finite number of players. Finally, we study an example of an $N$-player game inspired by the prisoner's dilemma whose equilibria are not equilibria of the limit mean-field game.

Section 4 considers the discrete time case. In this case, $N$-player games can be seen as classical repeated games. The mean-field limit dynamics are derived in discrete time and the existence of an equilibrium is proved as in the continuous time case. Here, counter-examples of equilibria for finite games that do not carry over to the limit are even more common. Indeed, the folk theorem applies and none of the equilibria based on retaliation can be equilibria at the limit.

Finally, a dedicated section studies one particular mean-field game, namely a vaccination game against an infection propagation. In this SIRV model, the players can take four states: Susceptible, Infected, Recovered or Vaccinated. We consider that each player uses a vaccination strategy whenever she is susceptible. The cost includes a vaccination cost as well as a cost per unit of time spent infected. The goal of a generic player is to choose her vaccination strategy so as to minimize her expected cost over time. We show that there exists a mean-field equilibrium that is pure (deterministic) and of threshold type. More precisely, the mean-field equilibrium consists in vaccinating at the maximum rate from time 0 up to some critical time, after which the player no longer vaccinates. We also formulate the centralized optimization problem, where the goal is to find the population vaccination strategy that minimizes the total cost. We show that the solution of the centralized problem (or global optimum) is also of threshold type. We observe numerically that, except for trivial cases, the mean-field equilibrium does not coincide with a global optimum. We also present a pricing mechanism that gives individual players an incentive to follow the globally optimal strategy. Our simulations provide numerical evidence that the vaccination cost must be subsidized to encourage selfish individuals to vaccinate optimally.


2. Continuous Time 

2.1. Notations and Definitions 

We consider a population made of an infinite number of homogeneous players that evolve in continuous time. Each player has a finite state space denoted by $\mathcal{S} = \{1, \dots, S\}$ and a finite action set $\mathcal{A} = \{1, \dots, A\}$.

A mixed strategy (or strategy for short) is a measurable function $\pi : \mathcal{S} \times \mathbb{R}^+ \to \mathcal{P}(\mathcal{A})$ that associates to each state $i \in \mathcal{S}$ and each time $t \ge 0$ a probability measure $\pi_i(t)$ on the set of possible actions, where $\mathcal{P}(\mathcal{A})$ is the set of probability measures over $\mathcal{A}$ (as $\mathcal{A}$ is finite, $\mathcal{P}(\mathcal{A})$ is the simplex). We also denote by $\pi_{i,a}(t)$ the probability that, at time $t$, a player in state $i$ takes the action $a$ under strategy $\pi$. For all $t \ge 0$ and all $i \in \mathcal{S}$, we have $\sum_{a \in \mathcal{A}} \pi_{i,a}(t) = 1$. A strategy is deterministic (or pure) if, for every state $i$ and every $t \in \mathbb{R}^+$, there exists an action $a \in \mathcal{A}$ such that $\pi_{i,a}(t) = 1$ and $\pi_{i,a'}(t) = 0$ for all $a' \ne a$.

We denote by $m^\pi(t) \in \mathcal{P}(\mathcal{S})$ the population distribution at time $t$. As the state space is finite, $m^\pi(t)$ is a vector whose $i$th component, $m_i^\pi(t)$, is the proportion of players in state $i$ at time $t$. We assume that the initial condition $m^\pi(0) = m_0$ is fixed. For $t > 0$, the population distribution $m^\pi$ is the solution of the following differential equation, which depends on the strategy $\pi$: for $j \in \mathcal{S}$,

$$\dot{m}_j^\pi(t) = \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} m_i^\pi(t)\, Q_{ija}(m^\pi(t))\, \pi_{i,a}(t). \tag{1}$$

The rationale behind this differential equation is that when the action $a \in \mathcal{A}$ is taken, the players in state $i$ move to state $j$ with rate $Q_{ija}(m^\pi(t))$.
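As an illustration, the population dynamics (1) can be integrated numerically. The following minimal Python sketch uses a forward Euler scheme. The function names, the callable interfaces `rate(i, j, a, m)` and `policy(i, a, t)`, the step size and the two-state toy rates are assumptions made for this illustration only. The sketch also assumes that only off-diagonal rates are supplied and that each diagonal entry of the rate matrix is minus the sum of the off-diagonal rates of its row, so that the total mass is conserved.

```python
def simulate_population(m0, rate, policy, n_actions, horizon, dt=1e-3):
    """Forward-Euler integration of the population dynamics (1).

    rate(i, j, a, m): transition rate from state i to state j != i under
        action a when the population distribution is m (hypothetical interface).
    policy(i, a, t): probability pi_{i,a}(t) of playing a in state i at time t.
    """
    m = list(m0)
    n_states = len(m)
    t = 0.0
    for _ in range(int(round(horizon / dt))):
        dm = [0.0] * n_states
        for i in range(n_states):
            for a in range(n_actions):
                weight = m[i] * policy(i, a, t)
                for j in range(n_states):
                    if j == i:
                        continue
                    # Mass leaving i for j; the diagonal rate is implied.
                    flow = weight * rate(i, j, a, m)
                    dm[j] += flow
                    dm[i] -= flow
        m = [mi + dt * di for mi, di in zip(m, dm)]
        t += dt
    return m


def toy_rate(i, j, a, m):
    # Explicit interaction: the 0 -> 1 rate grows with the mass already in state 1.
    if i == 0 and j == 1:
        return a * (0.1 + m[1])
    return 0.5  # constant 1 -> 0 rate


def uniform_policy(i, a, t):
    return 0.5  # two actions, each played with probability 1/2


m_final = simulate_population([0.9, 0.1], toy_rate, uniform_policy,
                              n_actions=2, horizon=5.0)
```

Because the toy rate from state 0 to state 1 depends on the current distribution, the resulting dynamics are nonlinear in the distribution, which is exactly what the explicit-interaction setting allows.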

Remark 1 (Explicit interactions). In this model, the rate matrix $Q_{ija}(m^\pi(t))$ depends explicitly on the population distribution: the rate to go from state $i$ to state $j$ under action $a$ depends on how the whole population is distributed among the states of the system. Other mean-field models, such as [?], only consider the special case where $Q_{ija}(m^\pi(t))$ is constant: $Q_{ija}(m^\pi(t)) = Q_{ija}$. This restricts the population dynamics given in (1) to linear dynamics.

We now concentrate on a particular player, which we call Player 0. Player 0 chooses her own strategy $\pi^0 : \mathcal{S} \times \mathbb{R}^+ \to \mathcal{P}(\mathcal{A})$. We denote by $x^{\pi^0}(t) \in \mathcal{P}(\mathcal{S})$ the probability distribution of Player 0 when she uses strategy $\pi^0$ against a population that plays strategy $\pi$. The dependence of the state of Player 0 on $\pi$ is kept implicit to avoid heavy notation. For a given state $i \in \mathcal{S}$, $x_i^{\pi^0}(t)$ denotes the probability for Player 0 to be in state $i$ at time $t$. The distribution $x^{\pi^0}$ evolves over time according to the following differential equation: for $j \in \mathcal{S}$,

$$\dot{x}_j^{\pi^0}(t) = \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} x_i^{\pi^0}(t)\, Q_{ija}(m^\pi(t))\, \pi^0_{i,a}(t). \tag{2}$$

If Player 0 is in state $i$ and takes an action $a$, she suffers an instantaneous cost $c_{i,a}(m^\pi(t))$ that depends on the population distribution at time $t$. Given a population strategy $\pi$ and the strategy $\pi^0$ of Player 0, we define the discounted cost of Player 0 as

$$V(\pi^0, \pi) = \int_0^\infty \Big( \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} x_i^{\pi^0}(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(t) \Big) e^{-\beta t}\, dt, \tag{3}$$

where $\beta > 0$ is the discount factor.
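The discounted cost of Player 0 can be approximated numerically once a population trajectory is available. The sketch below is only illustrative: it truncates the infinite horizon, uses a left Riemann sum together with an Euler step for the distribution of Player 0, and assumes hypothetical callables `rate`, `cost` and `policy0` standing for the rates, the instantaneous cost and the strategy of Player 0.

```python
import math


def discounted_cost(x0, m_path, rate, cost, policy0, n_actions, beta, dt):
    """Left Riemann sum for the discounted cost, truncated to len(m_path) * dt.

    m_path[k] is the population distribution at time k * dt (assumed precomputed);
    rate, cost and policy0 are hypothetical callables standing for Q, c and pi^0.
    """
    x = list(x0)
    n_states = len(x)
    total = 0.0
    for k, m in enumerate(m_path):
        t = k * dt
        discount = math.exp(-beta * t)
        dx = [0.0] * n_states
        for i in range(n_states):
            for a in range(n_actions):
                weight = x[i] * policy0(i, a, t)
                total += weight * cost(i, a, m) * discount * dt
                for j in range(n_states):
                    if j != i:
                        flow = weight * rate(i, j, a, m)
                        dx[j] += flow
                        dx[i] -= flow
        # Player 0 moves with the population's rates but her own strategy.
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return total


# Sanity check: one state, one action, unit cost, so the cost reduces to 1 / beta.
unit_cost = discounted_cost(
    x0=[1.0], m_path=[[1.0]] * 20000, rate=lambda i, j, a, m: 0.0,
    cost=lambda i, a, m: 1.0, policy0=lambda i, a, t: 1.0,
    n_actions=1, beta=1.0, dt=1e-3)
```

The sanity check uses a degenerate game in which the integrand is $e^{-\beta t}$, so the value should be close to $1/\beta$ up to discretization and truncation errors.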

Example 1 (Epidemic model). Later in the paper, we analyze an epidemic model. The state space of the players is $\mathcal{S} = \{S, I, R, V\}$: susceptible ($S$), infected ($I$), recovered ($R$) or vaccinated ($V$). The propagation of the infection follows a uniform contact process: an arbitrary player gets in contact with the other players according to a Poisson process with rate $\gamma$. If the other player is infected, then the infection is transmitted to the player. As for recovery, each player recovers with rate $\rho$. A player can only take actions when she is in the susceptible state. Her strategy consists in choosing (with rate $r$) a vaccination probability at time $t$: $\pi(t) = p_v(t)$, and the set of pure actions is reduced to two points $\{v, \neg v\}$ (vaccinate or not). In that case, the dynamics of the population distribution are driven by (1) with

$$\begin{aligned}
Q_{S,I,v}(m^\pi(t)) &= Q_{S,I,\neg v}(m^\pi(t)) = \gamma\, m_I^\pi(t), \\
Q_{S,V,v} &= r, \\
Q_{S,V,\neg v} &= 0, \\
Q_{I,R,v}(m^\pi(t)) &= Q_{I,R,\neg v}(m^\pi(t)) = \rho.
\end{aligned}$$

All the other entries of $Q$ are null. Note that $Q$ depends explicitly on the population distribution: the rate of infection depends on the proportion of infected players.
Player 0 is an independent player who chooses her own vaccination strategy $\pi^0$. The strategy $\pi^0$ does not modify the population distribution $m^\pi$. In this model, the cost of Player 0 is decomposed into a fixed vaccination cost plus an infection cost for the duration of her infection.
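To make the epidemic dynamics of Example 1 concrete, the following sketch integrates the four proportions with a forward Euler scheme. It assumes that the contact rate is $\gamma$, the recovery rate is $\rho$ and the maximal vaccination rate is $r$; the numerical values, the function name and the threshold strategy used in the comparison are arbitrary choices for illustration, not results from the paper.

```python
def simulate_sirv(m0, p_vacc, gamma, rho, r, horizon, dt=1e-3):
    """Euler integration of the SIRV population dynamics of Example 1.

    m0 = (S, I, R, V) proportions; p_vacc(t) is the vaccination probability
    p_v(t) used by the whole population (hypothetical callable).
    """
    s, i, rec, v = m0
    t = 0.0
    for _ in range(int(round(horizon / dt))):
        infection = gamma * i * s          # rate S -> I is gamma * m_I
        vaccination = r * p_vacc(t) * s    # rate S -> V is r, played w.p. p_v(t)
        recovery = rho * i                 # rate I -> R is rho
        s, i, rec, v = (s - dt * (infection + vaccination),
                        i + dt * (infection - recovery),
                        rec + dt * recovery,
                        v + dt * vaccination)
        t += dt
    return s, i, rec, v


m0 = (0.9, 0.1, 0.0, 0.0)
never = simulate_sirv(m0, lambda t: 0.0, gamma=3.0, rho=1.0, r=2.0, horizon=20.0)
threshold = simulate_sirv(m0, lambda t: 1.0 if t < 1.0 else 0.0,
                          gamma=3.0, rho=1.0, r=2.0, horizon=20.0)
```

Comparing `never` and `threshold` shows how a population strategy that vaccinates at the maximal rate up to a given time changes the proportion of players who end up recovered, that is, who were infected at some point.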

The goal of Player 0 is to choose a strategy $\pi^0$ that minimizes her discounted cost (3) when the rest of the population plays a strategy $\pi$. For a given population strategy $\pi$, we denote the set of best responses of Player 0 to $\pi$ by $BR(\pi)$. This set is the set of strategies that minimize her discounted cost:

$$BR(\pi) = \operatorname*{arg\,min}_{\pi^0} V(\pi^0, \pi).$$

We then define a mean-field equilibrium as a strategy $\pi^{MFE}$ such that, when the population strategy is $\pi^{MFE}$, a selfish Player 0 would also choose the same strategy $\pi^{MFE}$.

Definition 1 (Mean-Field Equilibrium). A strategy $\pi^{MFE}$ is called a mean-field equilibrium if it is a fixed point of the best-response function, i.e.,

$$\pi^{MFE} \in BR(\pi^{MFE}). \tag{4}$$

A mean-field equilibrium is said to be pure if it is a deterministic strategy.

The rationale behind this definition is that the population is formed by players who each take selfish decisions. As the population is homogeneous, the best response of each player is the same as that of Player 0. In other words, for a given population strategy $\pi$, all the rational players of the population choose a strategy in $BR(\pi)$. A mean-field equilibrium is a situation where no player has an incentive to deviate unilaterally from her strategy. In the case of the epidemic model of Example 1, a vaccination strategy is a mean-field equilibrium if it coincides with the vaccination strategy chosen by a selfish Player 0.

2.2. Existence of Mean-Field Equilibrium 

We now show that, under very general assumptions, these games have a mean-field equilibrium. As for classical games, these equilibria are not necessarily pure. In the epidemic model of Example 1, which we study later in the paper, we show that there exists a pure mean-field equilibrium. Our proof relies on a generalization of the Kakutani fixed point theorem to infinite-dimensional spaces. The main difficulty of the proof is that the best-response function $BR(\pi)$ is not a Kakutani map because we did not assume the cost function to be strictly convex (contrary to [12] for example).

Our technical condition essentially ensures that the differential equations (1) and (2) and the cost equation (3) are well defined, which is guaranteed by Assumption (A1):

(A1) The function $m \mapsto Q_{ija}(m)$ is Lipschitz-continuous in $m$. The function $m \mapsto c_{i,a}(m)$ is continuous in $m$.

In particular, this assumption implies that the costs and the rates are both bounded by a finite value.

Theorem 1. Any continuous time mean-field game with discounted cost that satisfies Assumption (A1) has a mean-field equilibrium.


Proof. The best-response function $\pi \mapsto BR(\pi)$ is neither continuous nor hemi-continuous in general. As a result, we formulate the fixed point problem in an alternative manner by considering a fixed point in $m$.

By Assumption (A1), the function $Q_{ija}(m)$ is continuous in $m$. As $m \in \mathcal{P}(\mathcal{S})$ lives in a compact space, this shows that $Q_{ija}(m)$ is bounded by some constant $L$. Let $\mathcal{M}$ be the set of continuous functions from $\mathbb{R}^+$ to $\mathcal{P}(\mathcal{S})$ that are Lipschitz-continuous with constant $LS$. As a result, $\mathcal{M}$ includes the set of solutions of the differential equation (1). We equip $\mathcal{M}$ with a norm $\|m - m'\|$ defined over all $t \ge 0$. By the Arzelà-Ascoli theorem, $\mathcal{M}$ is a compact space.

For a given population distribution $m \in \mathcal{M}$, we say that a strategy $\pi^0$ is feasible for $(x, m)$ if $x$ satisfies (2). We then define

$$V(x, m) = \inf_{\pi^0 \text{ feasible for } (x, m)} \int_0^\infty \sum_{i,a} x_i(t)\, \pi^0_{i,a}(t)\, c_{i,a}(m(t))\, e^{-\beta t}\, dt, \tag{5}$$

with the convention that $V(x, m) = +\infty$ if there exists no strategy feasible for $(x, m)$. This definition is also valid for the functions $m \in \mathcal{M}$ that do not satisfy (1) for any strategy $\pi$.

We now define the set-valued map $\phi$ on $\mathcal{M}$ such that, for all $m \in \mathcal{M}$,

$$\phi(m) = \operatorname*{arg\,min}_{x \in \mathcal{M}} V(x, m). \tag{6}$$

Note that for all $x \in \phi(m)$, there exists a strategy $\pi^0$ that is feasible for $(x, m)$.

The quantity $\phi(m)$ is the best response of Player 0 to the population $m$. Hence, if we show that $\phi(\cdot)$ has a fixed point $m \in \phi(m)$, there exists a strategy $\pi^0$ that is feasible for $(m, m)$. This implies that $m$ satisfies (1), from which we conclude that there exists a mean-field equilibrium for the mean-field game with explicit interactions.

We first show that the function $\phi(\cdot)$ satisfies the following properties.

Lemma 1. (a) For all $m \in \mathcal{M}$, $\phi(m)$ is convex.

(b) The graph of $m \mapsto \phi(m)$ is closed, that is: for any sequences $m_n \in \mathcal{M}$ and $x_n \in \phi(m_n)$ such that $\lim_{n \to \infty} m_n = m$ and $\lim_{n \to \infty} x_n = x$, we have $x \in \phi(m)$.

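Proof. (a) Let $x$ be a trajectory. Replacing the quantity $x_i(t)\, \pi^0_{i,a}(t)$ by $z_{i,a}(t)$ in (5), $V(x, m)$ can be written as

$$V(x, m) = \min_{z} \int_0^\infty \Big( \sum_{i,a} z_{i,a}(t)\, c_{i,a}(m(t))\, e^{-\beta t} \Big) dt, \tag{7}$$

where $z$ satisfies

$$\begin{aligned}
& z_{j,a}(t) \ge 0, && \forall j \in \mathcal{S},\ a \in \mathcal{A}, \\
& \sum_{a \in \mathcal{A}} z_{j,a}(t) = x_j(t), && \forall j \in \mathcal{S}, \\
& \dot{x}_j(t) = \sum_{i \in \mathcal{S}} \sum_{a \in \mathcal{A}} z_{i,a}(t)\, Q_{ija}(m(t)), && \forall j \in \mathcal{S}.
\end{aligned} \tag{8}$$

Let $x_1 \in \phi(m)$ and $x_2 \in \phi(m)$, and let $x_3 = \alpha x_1 + (1 - \alpha) x_2$ for some $\alpha \in (0, 1)$. We show that $x_3 \in \phi(m)$. There exist $z_1$ and $z_2$ that minimize (7) for $x_1$ and $x_2$ and satisfy the above constraints. As the constraints are linear, it follows immediately that $z_3 = \alpha z_1 + (1 - \alpha) z_2$ satisfies the same constraints for $x_3$ and also minimizes (7).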

(b) Since $x_n \in \phi(m_n)$, there exists a strategy $\pi_n$ such that $x_n$ satisfies (2) and minimizes $V(x_n, m_n)$. We note that $\pi_n$ belongs to an $L^p$ space that is reflexive. Therefore, the sequence $\pi_n$ has a subsequence that converges weakly to some $\pi$. Using the continuity assumptions (A1), we conclude that $x \in \phi(m)$.

□


Lemma 1 shows that $\phi$ has convex and compact values. Moreover, the function $\phi(\cdot)$ is upper-semicontinuous since it satisfies the property of Lemma 1(b) and $\mathcal{M}$ is compact [?, Prop. 11.11]. Hence, $\phi(\cdot)$ satisfies the conditions of [13, Theorem 8.6] and therefore has a fixed point. □


2.3. Finite Horizon Case

The above formulation generalizes directly to a finite-horizon setting where each player tries to minimize her cost over a finite horizon $T$. As in the discounted case, the evolution over time of the population distribution $m^\pi$ is given by (1) and the evolution of Player 0's distribution $x^{\pi^0}$ is given by (2).



Given the population strategy $\pi$ and the strategy $\pi^0$ of Player 0, the expected cost of Player 0 for the finite horizon case is defined as follows:



(9) 


In the literature, there are similar models considering continuous time, finite state space mean-field games with a finite horizon. For example, the authors of [?] consider uniformly convex cost functions and those of [14] strictly convex cost functions. In our model, we assume that the costs are continuous but, as can be observed, the instantaneous cost of Player 0 is linear in $\pi^0$. Therefore, the model we study in this work is not covered by these models.

We define the notion of mean-field equilibrium for the finite horizon case as in the discounted case, by replacing the cost function (3) by (9). Then, the proof of Theorem 1 applies mutatis mutandis to show the existence of a mean-field equilibrium in this case.

Theorem 2. Any continuous time mean-field game with finite horizon cost that satisfies Assumption (A1) has a mean-field equilibrium.

3. Convergence of Finite Games to Mean-field Game 

In this section, we show that mean-field equilibria are the limits of a sub-class of Nash equilibria when the number of players goes to infinity.

3.1. Mean-field Markov Game with N Interacting players 

Our model is motivated by a game with $N$ players that evolve in continuous time. Each player becomes active according to a Poisson process, independently of the others. When a player becomes active, she chooses a finite number of players at random among the others. Each of these players then takes an action that influences the payoff of the various players and the evolution of their internal state.

When the system is observed only when the players interact, this model can be viewed as a discrete-time model. Players can only take decisions, to optimize their expected payoff over an infinite time horizon, at discrete times in $\mathcal{T}_N = \{0, 1/N, 2/N, \dots\}$. Each player has an internal state that evolves over time. The state of player $n$ at time $t$ is denoted by $X_n(t)$. The set of possible states of a player is finite and is denoted by $\mathcal{S}$. The global description of the system at time $t$ is

$$X(t) = (X_1(t), \dots, X_N(t)) \in \mathcal{S}^N.$$


We consider a mean-field interaction model, which means that the behavior of one object depends on the states of the other objects only through the proportion of objects that are in a given state. To be more precise, we denote by $M(t) \in \mathcal{P}(\mathcal{S})$ the population distribution of the system at time $t$, where $\mathcal{P}(\mathcal{S})$ is the set of probability measures on $\mathcal{S}$. As the set $\mathcal{S}$ is finite, $M(t)$ is a vector with $|\mathcal{S}|$ components and, for all $s \in \mathcal{S}$, $M_s(t)$ is the fraction of players that are in state $s$ at time $t$:

$$M_s(t) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}_{\{X_n(t) = s\}}.$$
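The population distribution of the $N$-player system is simply the empirical distribution of the individual states. A minimal sketch follows; the function name and the string labels for the states are assumptions made for this illustration.

```python
from collections import Counter


def empirical_distribution(states, state_space):
    """Fraction of players in each state of state_space."""
    counts = Counter(states)
    n_players = len(states)
    return {s: counts[s] / n_players for s in state_space}


example = empirical_distribution(["S", "I", "S", "R"], ["S", "I", "R", "V"])
```

States that no player occupies get proportion 0, so the result always has one entry per state and sums to 1.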

At each time step $t \in \mathcal{T}_N$, a finite revision set $R(t)$ of players is defined, according to an IID process over all the sets of players. Only the players in $R(t)$ can revise their action at time $t$. This action is fixed until the next time this player is chosen again. We assume that for all $i \ne j \in \mathcal{S}$, there exists a function $Q_{ij} : \mathcal{A} \times \mathcal{P}(\mathcal{S}) \to \mathbb{R}^+$ that is Lipschitz-continuous in $m$ and such that

$$\mathbb{P}\Big( X_n\big(t + \tfrac{1}{N}\big) = j \,\Big|\, X_n(t) = i,\ n \in R(t),\ M(t) = m,\ A_n(t) = a,\ \mathcal{F}_t \Big) = \frac{1}{\mathbb{E}[|R(t)|]}\, Q_{ij}(a, m), \tag{10}$$

where $\mathcal{F}_t$ is the natural filtration of the process.
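One step of these asynchronous dynamics can be sketched as follows, under the simplifying assumption that exactly one player, chosen uniformly at random, revises at each step, so that the expected size of the revision set is 1 and the selected player jumps from $i$ to $j$ with probability $Q_{ij}(a, m)$. The callables `choose_action` and `jump_probs`, and the requirement that the jump probabilities out of a state sum to at most 1, are assumptions of this illustration.

```python
import random


def revision_step(states, choose_action, jump_probs, rng):
    """One step of length 1/N with a single revising player.

    choose_action(state, m) returns the action of the revising player;
    jump_probs(state, action, m) returns a dict {target: probability} over
    states different from the current one (hypothetical interfaces).
    """
    n_players = len(states)
    m = {s: states.count(s) / n_players for s in set(states)}
    n = rng.randrange(n_players)          # the revision set is {n}
    action = choose_action(states[n], m)
    u = rng.random()
    cumulative = 0.0
    for target, prob in jump_probs(states[n], action, m).items():
        cumulative += prob
        if u < cumulative:
            states[n] = target            # the player jumps to the target state
            break
    return n, action


rng = random.Random(0)
states = ["C"] * 5
revised, action = revision_step(
    states, lambda s, m: "D",
    lambda s, a, m: {a: 1.0} if a != s else {}, rng)
```

In the toy call, the revising player always chooses `"D"` and jumps there with probability 1, so exactly one player changes state.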

At time $t \in \mathcal{T}_N$, player $n$ suffers an instantaneous cost that is a function of her state $X_n(t)$, the action $A_n(t)$ that she takes, and the population distribution $M(t)$. We write this instantaneous cost $c_{X_n(t), A_n(t)}(M(t))$. The objective of player $n$ is to choose a strategy $\pi^n$ from some set of admissible strategies $\Pi$ in order to minimize her expected discounted cost, knowing the strategies of the others. As before, the discount factor is denoted by $\beta$. Given a strategy $\pi^n \in \Pi$ used by player $n$ and a strategy $\pi \in \Pi$ used by all the others, we denote by $V^N(\pi^n, \pi)$ the expected discounted cost of player $n$:


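$$V^N(\pi^n, \pi) = \mathbb{E}\Big[ \sum_{t \in \mathcal{T}_N} \frac{1}{N}\, e^{-\beta t}\, c_{X_n(t), A_n(t)}(M(t)) \,\Big|\, A_n \text{ is chosen w.r.t. } \pi^n,\ A_{n'} \text{ is chosen w.r.t. } \pi\ (n' \ne n) \Big].$$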


An equilibrium for this game is a strategy $\pi$ such that no player has another admissible strategy that leads to a lower cost. This notion naturally depends on the set of admissible strategies.

Definition 2 (Equilibrium of the $N$ player game). For a given set of strategies $\Pi$, a strategy $\pi \in \Pi$ is called a symmetric $\Pi$-equilibrium for the $N$ player game if, for any strategy $\pi^n \in \Pi$:

$$V^N(\pi, \pi) \le V^N(\pi^n, \pi).$$


We will also use the notion of $\varepsilon$-equilibrium:

Definition 3 ($\varepsilon$-equilibrium of the $N$ player game). For a given set of strategies $\Pi$, a strategy $\pi \in \Pi$ is called a symmetric $\varepsilon$-$\Pi$-equilibrium for the $N$ player game if, for any strategy $\pi^n \in \Pi$:

$$V^N(\pi, \pi) \le V^N(\pi^n, \pi) + \varepsilon.$$


3.2. Possible set of admissible strategies 

In a full information setting, $A_n(t)$ is a (possibly random) function of the values $X_{n'}(t')$ for $t' \le t$ and of all actions taken in the past, $A_{n'}(t')$ for $t' < t$, for all $n' \in \{1, \dots, N\}$. Such a strategy is, however, hard to describe. Therefore, in the following, we consider the following restrictions of the set of admissible strategies:


(Markov) - A strategy $\pi$ is called a Markov strategy if it induces a choice of $A_n(t)$ that is a (possibly random) function of only $t$, $M(t)$ and $X_n(t)$:

$$\mathbb{P}(A_n(t) = a \mid \mathcal{F}_t) = \pi_{a, X_n(t)}(M(t)).$$

This definition is motivated by the fact that, as indicated by Equation (10), the behavior of one object depends on the others only through the value $M(t)$. This implies that when all the other players use a Markov strategy, the set of Markov strategies is dominant among the set of full-information strategies: there exists a full-information best response for player $n$ that is a Markov strategy.

(Local) - We call a strategy $\pi$ a local strategy if the choice of the action only depends on the player's internal state and on the time:

$$\mathbb{P}(A_n(t) = a \mid \mathcal{F}_t) = \pi_{a, X_n(t)}(t).$$

Note that we allow this strategy to depend on time because $M(t)$ evolves with time.


3.3. Technical assumptions and limiting regimes 

In addition to Assumption (A1), which ensures the continuity of the functions $Q_{ij}(a, m)$ and $c_{i,a}(m)$ in $m$, we will also assume Assumption (A2):

(A2) The number of players that are selected at each point in $\mathcal{T}_N$ has a bounded second moment, i.e., there exists $B < \infty$ such that for all $t \in \mathcal{T}_N$: $\mathbb{E}[|R(t)|^2] \le B$.

The next theorem provides an equivalence between local equilibria and mean-field equilibria. In particular, it shows that mean-field equilibria are a good approximation of local equilibria. However, as we will show later, this result does not hold for Markov equilibria.
Theorem 3. Assume that (A1) and (A2) hold. Then:

(i) Let $\pi$ be a mean-field equilibrium. For all $\varepsilon > 0$, there exists $N_0$ such that for all $N \ge N_0$, $\pi$ is a local $\varepsilon$-equilibrium of the $N$ player game.

(ii) If $(\pi^N)_N$ is a sequence of local strategies such that $\pi^N$ is a local equilibrium for the $N$ player game, then any sub-sequence of the sequence $(\pi^N)_N$ has a sub-sequence that converges weakly to a mean-field equilibrium.

Proof. The main technical difficulty of the proof is to show that for any local strategies $\pi^n$ and $\pi$, $V^N(\pi^n, \pi)$ converges to $V(\pi^n, \pi)$ uniformly in $\pi$ and $\pi^n$. Uniform convergence follows from Theorem 3.3.2 in [28]. Indeed, local strategies as defined here are equivalent to stationary strategies, as defined in [28].

Thus, for any $\varepsilon > 0$, there exists $N_0$ such that $N \ge N_0$ implies that $|V^N(\pi^n, \pi) - V(\pi^n, \pi)| \le \varepsilon/2$. Hence, if $\pi$ is a mean-field equilibrium, this implies that for any local strategy $\pi^n$:

$$V^N(\pi, \pi) \le V(\pi, \pi) + \frac{\varepsilon}{2} \le V(\pi^n, \pi) + \frac{\varepsilon}{2} \le V^N(\pi^n, \pi) + \varepsilon.$$

This shows (i).

For (ii), if $(\pi^N)_N$ is a sequence of local strategies, then any sub-sequence has a sub-sequence that converges weakly to some local strategy $\pi^\infty$. As $V(\pi^n, \pi)$ is continuous in $\pi^n$ and $\pi$ (for the weak topology), this implies that $V(\pi^\infty, \pi^\infty) \le V(\pi^n, \pi^\infty)$ for every local strategy $\pi^n$. □


3.4. Markov equilibria do not converge to MFE

We now show that this theorem does not generalize to Markov strategies. The main ingredient used to construct the following counterexample is the “tit-for-tat” principle. This idea can be used to define equilibria for any $N$-player game but cannot be used in mean-field games. In mean-field games, punishment is possible against a fraction of the population that deviates but is not possible against an individual deviation, because the latter is not seen in the population distribution.

We consider a mean-field version of the classical prisoner's dilemma. The state space of a player is $\mathcal{S} = \{C, D\}$ (which stand for Cooperate and Defect) and the action set is the same: $\mathcal{A} = \mathcal{S}$. At each time step, one player is chosen. If she selects an action $a \in \mathcal{A}$, her state becomes $a$ at the next time step.

The instantaneous cost of a player $n$ depends on her state $i$ and on the mean field $m$:

$$c_{i,a}(m) = \begin{cases} m_C + 3 m_D & \text{if } i = C, \\ 2 m_D & \text{if } i = D. \end{cases}$$
This cost function corresponds to the following situation: at each time step, a player plays her strategy against a randomly assigned opponent and suffers a cost given by the following matrix:

         C      D
    C   1,1    3,0
    D   0,3    2,2


The strategy $D$ dominates the strategy $C$. This implies that playing $D$ is the unique mean-field equilibrium. Indeed, by using that $x_C(t) + x_D(t) = m_C(t) + m_D(t) = 1$, the expected cost (given by (3)) of a Player 0 that has a state vector $x$ while the mean field is $m(t)$ is

$$\int_0^\infty \big[ x_C(t)\, m_C(t) + 3 x_C(t)\, m_D(t) + 2 x_D(t)\, m_D(t) \big] e^{-\beta t}\, dt = \int_0^\infty \big[ x_C(t) + 2 m_D(t) \big] e^{-\beta t}\, dt.$$

This cost is minimized when $x_C$ is minimal, which occurs when the strategy is to choose action $D$ regardless of the current state. This shows that the only mean-field equilibrium is when all players choose action $D$.
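The simplification of the integrand only uses the two normalizations $x_C + x_D = 1$ and $m_C + m_D = 1$. Written out step by step, with time arguments omitted:

```latex
\begin{aligned}
x_C m_C + 3 x_C m_D + 2 x_D m_D
  &= x_C (m_C + m_D) + 2 m_D (x_C + x_D) \\
  &= x_C + 2 m_D .
\end{aligned}
```

Since $m_D$ is not affected by the strategy of Player 0, the only term she controls is $x_C$, which is why spending as little time as possible in state $C$ is optimal.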

Let us now consider the game with $N$ players and consider the following Markov strategy:

$$\pi(m) = \begin{cases} C & \text{if } m_C = 1, \\ D & \text{if } m_C < 1, \end{cases}$$

and let us show that for $\beta < 1$ and $N$ large, $\pi$ is a Markov equilibrium.

Assume that all players, except player $n$, play the strategy $\pi$ and let us compute the best response of player $n$. If at time 0, $m_C < 1$, then the best response of player $n$ is to play $D$. On the other hand, if $m_C = 1$ and if player $n$ is picked, then:

• If player $n$ applies $\pi$, she will suffer a cost approximately equal to $\int_0^\infty \exp(-\beta t)\, dt = 1/\beta$.

• If player $n$ deviates from $\pi$ at this time step and chooses the action $D$, all players will also deviate after the next time step. This implies that $m_D(t) = 1 - \exp(-t)$ and that player $n$ will suffer a cost approximately equal to $\int_0^\infty \big( x_C(t) + 2 - 2e^{-t} \big) e^{-\beta t}\, dt \ge 2/(\beta(\beta + 1))$ when $N$ is large.

This shows that when $\beta < 1$, player $n$ has no incentive to deviate from the strategy $\pi$ and that, therefore, $\pi$ is a Nash equilibrium.
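The comparison between the two costs can be checked numerically. The sketch below evaluates the closed forms $1/\beta$ and $2/(\beta(\beta+1))$ and verifies the second one with a midpoint-rule integration; the function names, the truncation of the integral at 60 and the number of steps are arbitrary choices made for this illustration.

```python
import math


def cost_if_cooperating(beta):
    # Integral of exp(-beta * t) over [0, infinity).
    return 1.0 / beta


def deviation_cost_lower_bound(beta):
    # Integral of (2 - 2 exp(-t)) exp(-beta * t) over [0, infinity).
    return 2.0 / (beta * (beta + 1.0))


def midpoint_integral(f, upper, steps):
    h = upper / steps
    return h * sum(f((k + 0.5) * h) for k in range(steps))


beta = 0.5
numeric_bound = midpoint_integral(
    lambda t: (2.0 - 2.0 * math.exp(-t)) * math.exp(-beta * t), 60.0, 60000)
```

The lower bound on the cost of deviating exceeds the cost of cooperating exactly when $2/(\beta + 1) > 1$, that is, when $\beta < 1$. For $\beta \ge 1$ the bound no longer rules out a profitable deviation.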
3.4.1. Finite-time horizon

In the finite-horizon case, the above strategy $\pi$ is not a Nash equilibrium for the $N$-player game because, at the last time slot, the best response of player $n$ to any strategy is to play $D$. By induction on the number of time slots, the only Nash equilibrium is when all players play $D$, which coincides with the MFE.

Yet, a similar example also exists for the finite-time horizon. Let us consider the following payoff matrix:



         C      D      P
    C   1,1    3,0    3,0
    D   0,3    2,2    0,0
    P   0,3    0,0    3,3


The setting is similar to the previous example: the action set is equal to the state space, $\mathcal{S} = \mathcal{A} = \{C, D, P\}$, and at each time step, one player is chosen. If she selects an action $a \in \mathcal{A}$, then her state becomes $a$ at the next time step. This game can be viewed as a generalization of the prisoner's dilemma with an additional Nash equilibrium $P$ (which stands for “punish”). It can be shown that, when $T$ is large enough, the following strategy is a Nash equilibrium:

• if $t < 1$, play $C$ if $m_C = 1$, play $P$ otherwise;

• if $t \ge 1$, play $D$ if $m_P = 0$, play $P$ otherwise.

Here, the state $P$ is used as a stick to deter players from deviating from the imposed strategy. In this case, nobody has an incentive to deviate from this strategy at the last step because $P$ is also a Nash equilibrium.


4. Discrete Time 

As explained in the previous section, mean-field games in continuous time appear
naturally as the limit of N-player asynchronous games as N goes to infinity. In
asynchronous games with N players, only a small number of the players change state
at each time step. However, in many games, it is more natural to consider
synchronous games in which, at each time step, all players take an action.


4.1. Synchronous N-player game


We consider a game with N identical players, with several differences from the
model used in Section 3.1. Each player n has an internal state $X_n(t)$ that belongs
to a finite state space S (we write $X(t) = (X_1(t), \ldots, X_N(t))$) and chooses an action from a
finite action space A. The main difference with the asynchronous model is that at
each time step $t \in \mathbb{Z}_+$, all players choose an action $A_n(t) \in A$ simultaneously. We
assume that a player in state x who chooses action a goes to state y with probability
$P_{xy}(a, X(t))$ and that, given X(t), the evolutions of all players are independent.
Furthermore, the fact that all players are interchangeable implies that the dependence
on X(t) can be replaced by a dependence on the population distribution M(t). More
precisely, for any state vectors $x, y \in S^N$ and any action vector $a \in A^N$, we have:

$$\mathbb{P}\big(X(t+1) = y \mid X(t) = x,\ A(t) = a,\ \mathcal{F}_t\big) = \prod_{n=1}^{N} P_{x_n y_n}(a_n, m), \qquad (11)$$

where $\mathcal{F}_t$ is the natural filtration of the game up to time t, m is the population
distribution of x and $P_{xy}(a, m)$ forms a stochastic matrix, continuous in m.
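To make the product form of the transition law concrete, here is a minimal Python sketch of one synchronous step. It is an illustration under our own assumptions: the helper names, the `kernel(a, m)` interface and the two-state toy kernel are hypothetical and do not come from the model above.

```python
import numpy as np


def population_distribution(states, n_states):
    """Empirical distribution m of a state vector x (fraction per state)."""
    return np.bincount(states, minlength=n_states) / len(states)


def synchronous_step(states, actions, kernel, n_states, rng):
    """Sample X(t+1) given X(t) = states and A(t) = actions.

    kernel(a, m) is an assumed user-supplied function returning the
    |S| x |S| stochastic matrix P(a, m). Given m, every player moves
    independently, which is the product form of the transition law.
    """
    m = population_distribution(states, n_states)
    next_states = np.empty_like(states)
    for n, (x, a) in enumerate(zip(states, actions)):
        row = kernel(a, m)[x]
        next_states[n] = rng.choice(n_states, p=row)
    return next_states


# Hypothetical two-state kernel: action 1 moves a player from state 0 to
# state 1 with a probability that grows with the fraction already in state 1;
# state 1 is absorbing.
def toy_kernel(a, m):
    p = 0.2 + 0.6 * m[1] if a == 1 else 0.0
    return np.array([[1.0 - p, p], [0.0, 1.0]])


rng = np.random.default_rng(0)
x0 = np.array([0, 0, 1, 1, 0])
a0 = np.array([1, 0, 1, 0, 1])
x1 = synchronous_step(x0, a0, toy_kernel, 2, rng)
```

Only the players in state 0 who choose action 1 can move in this toy example; all the others keep their state with probability one.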

We consider an instantaneous cost at time t that depends on the actions and states
at time t − 1 and is symmetric in all players, so that it can be written as a function of the
population distribution: $c_{X_n(t),A_n(t)}(M(t))$, with a discount factor $\delta$ at each time
step. Given a strategy $\pi^n$ used by player n and a strategy $\pi$ used by all the others,
the expected cost of player n is:

$$V^N(\pi^n, \pi) = \mathbb{E}\Big[(1-\delta)\sum_{t=0}^{\infty} \delta^t\, c_{X_n(t),A_n(t)}(M(t)) \;\Big|\; A_n \text{ is chosen w.r.t. } \pi^n,\ A_{n'} \text{ is chosen w.r.t. } \pi\ (n' \neq n)\Big]. \qquad (12)$$


4.1.1. A particular case: Repeated games

The classical repeated games with discounted costs and with identical players
can be defined as follows. Let us first consider a static N-player matricial game G
with symmetric payoff: $u(a_1, \ldots, a_N)$ is the payoff of any player when the players
use actions $a_1, \ldots, a_N$ respectively. Furthermore, we assume that $u(a_1, \ldots, a_N) =
u(a_{\sigma(1)}, \ldots, a_{\sigma(N)})$ for any permutation $\sigma$ of $\{1, \ldots, N\}$. The players repeat the
matricial game infinitely often and their reward under strategies $\pi^1, \ldots, \pi^N$ is the
discounted sum of the payoffs:

$$V^N(\pi^1, \pi^2, \ldots, \pi^N) = (1-\delta)\sum_{t=0}^{\infty} \delta^t\, u(\pi^1(t), \pi^2(t), \ldots, \pi^N(t)). \qquad (13)$$


These games fit in our framework. The state of a player is merely her current
action (X(t) = A(t)) and the evolution of the state becomes trivial: in
state x = b and selecting action a, the next state does not depend on the other
players and becomes a with probability one: $P_{ba}(a, M(t)) = 1$. As for the payoff
u (or cost) of one player at each stage, it corresponds to an immediate cost
$c_{X_n(t),A_n(t)}(M(t)) = -u(X(t))$, since the payoff u only depends on the population
distribution by symmetry. As for the total reward of a player, (13) coincides with
(12), as long as all players in the same state use the same strategy.


4.2. Mean-field limit

Let us consider a strategy $\pi$ such that $\pi_{i,a}(m)$ is the probability for a player
to choose action a given that she is in state i and that M(t) = m. Assume that
M(0) converges in probability to m(0) as N goes to infinity and that all players
except player n apply a strategy $\pi$ that is continuous in m. As shown in Theorem
1 in [9] (up to differences in notation, the mean-field model in [9] is the same as
Equation (11)), the population distribution converges (in probability) to a
deterministic quantity $m^\pi(t)$ as N goes to infinity. $m^\pi(t)$ is defined by

$$m^\pi_j(t+1) = \sum_{i \in S} \sum_{a \in A} m^\pi_i(t)\, P_{ij}(a, m^\pi(t))\, \pi_{i,a}(m^\pi(t)). \qquad (14)$$

We denote by $\pi^0$ the strategy of Player 0. The probability that Player 0 is in state
$j \in S$ evolves over time according to the following equation:

$$x_j(t+1) = \sum_{i \in S} \sum_{a \in A} x_i(t)\, P_{ij}(a, m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)). \qquad (15)$$

In this case, the cost of Player 0, given by (12), becomes

$$V(\pi^0, \pi) = (1-\delta) \sum_{t=0}^{\infty} \sum_{i \in S} \sum_{a \in A} \delta^t\, x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)).$$


As the evolution of m is deterministic, for any closed-loop strategy $\pi_{i,a}(m(t))$
and any initial condition m(0), there exists an open-loop strategy $\pi_{i,a}(t)$ that leads
to the same values for $m^\pi(t)$ and the same cost. Hence, for the mean-field model,
one can replace every state-dependent strategy $\pi(m(t))$ in the above equations by a
time-dependent strategy $\pi(t)$.
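The deterministic recursion of the population and the discounted cost of Player 0 can be sketched in Python as follows. This is only a sketch under our own assumptions: strategies are open-loop, the infinite sum is truncated at a finite horizon, and the callables `P`, `pi`, `pi0` and `cost`, as well as the toy kernel at the end, are hypothetical interfaces chosen for the illustration.

```python
import numpy as np


def mean_field_trajectory(m0, P, pi, horizon):
    """Iterate m_j(t+1) = sum_i sum_a m_i(t) P(a, m)[i, j] pi(t)[i, a].

    P(a, m) returns an |S| x |S| stochastic matrix and pi(t) an |S| x |A|
    matrix of action probabilities (open-loop population strategy).
    """
    traj = [np.asarray(m0, dtype=float)]
    for t in range(horizon):
        m = traj[-1]
        strat = pi(t)
        nxt = np.zeros_like(m)
        for a in range(strat.shape[1]):
            nxt += (m * strat[:, a]) @ P(a, m)
        traj.append(nxt)
    return traj


def player0_cost(x0, P, cost, pi0, traj, delta):
    """Truncated discounted cost of Player 0 facing the trajectory traj.

    cost(a, m) returns the vector (c_{i,a}(m))_i over states i.
    """
    x, total = np.asarray(x0, dtype=float), 0.0
    for t, m in enumerate(traj[:-1]):
        strat = pi0(t)
        nxt = np.zeros_like(x)
        for a in range(strat.shape[1]):
            total += (1 - delta) * delta**t * np.sum(x * strat[:, a] * cost(a, m))
            nxt += (x * strat[:, a]) @ P(a, m)
        x = nxt
    return total


# Toy repeated-game kernel: the next state is the action just played.
def repeat_kernel(a, m):
    K = np.zeros((2, 2))
    K[:, a] = 1.0
    return K


def always_0(t):
    return np.array([[1.0, 0.0], [1.0, 0.0]])


def unit_cost(a, m):
    return np.ones(2)


traj = mean_field_trajectory([0.5, 0.5], repeat_kernel, always_0, horizon=50)
v = player0_cost([1.0, 0.0], repeat_kernel, unit_cost, always_0, traj, delta=0.9)
```

With a unit cost at every stage, the truncated cost equals $1 - \delta^{50}$, which is a convenient sanity check of the normalization by $(1 - \delta)$.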

Player 0 chooses the strategy that minimizes her expected cost, which depends
as well on the strategy $\pi$. When Player 0 does so, we say that she plays a best response
to the mass strategy $\pi$:

$$BR(\pi) = \arg\min_{\pi^0} V(\pi^0, \pi).$$

A strategy is said to be a mean-field equilibrium if it is a fixed point of the
best-response function, that is,

$$\pi^{MFE} \in BR(\pi^{MFE}).$$

One of the difficulties in the analysis of continuous-time mean-field games is
that the elements under consideration (the population distribution, the population
strategy, the strategy of Player 0, ...) are continuous functions of time. In the discrete
time case, the model is significantly simpler since all the elements are vectors.
Hence, the proof of the existence of a mean-field equilibrium for continuous-time
mean-field games can be adapted to show the following result.

Theorem 4 (Mean-Field Equilibrium Existence in Discrete Time and Discounted
Case). Any discrete-time mean-field game with discounted cost in which P and c
satisfy Assumption (A1) has a mean-field equilibrium.

Sketch of proof. We first observe that the set of discrete-time open-loop policies is
a compact and convex set. Thus, to finish the proof, we need to show that the
best-response function has a closed graph and that its values are convex. The former condition
holds since the set of open-loop policies belongs to a finite-dimensional space and
by the continuity assumptions (A1). The latter condition can be shown using the
same arguments as in the continuous-time case. □


4.3. The Folk Theorem does not scale

The relation between the equilibria of N-player games and their mean-field limits
is also complex in this case.

Let us first focus on results that concern the performance of a mean-field
equilibrium in the N-player game. The situation is almost the same as in the continuous-time
case and the result resembles part (i) of the corresponding continuous-time theorem.

Theorem 5. Let $\pi$ be a mean-field equilibrium. There exists $N_0$ such that for all
$N > N_0$, $\pi$ is a local $\varepsilon$-equilibrium of the N-player game.

Proof. The proof is essentially the same as in the continuous-time case. □


Let us now consider the Nash equilibria of the N-player game. The situation is
very different from the continuous-time case because the states of all the players can
change in one time unit in discrete time, while in continuous time the state can only
change in small steps, one player at a time.

This has several consequences on the nature of equilibria under both models. As
mentioned before, the Nash equilibria in the continuous-time case may depend on
the initial population distribution, but this is not the case here, so that there is
more latitude for designing equilibria.

Let us consider the particular case of repeated games, introduced in Section 4.1.1.
For this type of games, the set of equilibria can be characterized using the
folk theorem.

Theorem 6 (Folk theorem, adapted from Th. A in [8]). Let G be a symmetric
matricial game, and let V* be the reward under the strategy that repeats the Nash
equilibrium of the static game G. Then any feasible reward V larger than V* is
the reward of an equilibrium of the repeated game if the discount factor $\delta$ is large
enough.

Actually, for any V > V*, the construction of an equilibrium whose reward is
V is based on the “reward and punishment” principle. We claim that none of these
equilibria scale at the mean-field limit. Let us consider the following example for a
static game. Each player only has two strategies, D and C. If all players play D,
the payoff is 1. If all players play C, the payoff is 2. If some players play D and
others play C, then all the players who play C get 0 while the players who play D
get 3.

The unique Nash equilibrium of the static game is the strategy (D, D, ..., D). The
reward of the corresponding repeated game is $(1-\delta)\sum_{t=0}^{\infty}\delta^t = 1$.

Let us now consider the following strategy (called $\pi^*$ in the following) for all
players: play D for k rounds, then play C as long as every other player has followed
the same pattern, else play D forever. The reward of this strategy is between 1 and
2:

$$(1-\delta)\Big(\sum_{t=0}^{k-1} \delta^t + \sum_{t=k}^{\infty} 2\delta^t\Big) = 1 + \delta^k.$$

The strategy $\pi^*$ is an equilibrium if $\delta$ is large enough. Indeed, no player wants
to deviate in the first k rounds, because her gain would decrease from 1 to 0 and
then stay at 1. In the rounds after k, a deviation provides an immediate payoff
of 3 instead of 2, at the cost of being punished until the end of time, so
that a large enough $\delta$ makes this unprofitable.

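Let us now consider the mean-field limit of this game when the whole population
uses the strategy $\pi^*$.

If one player uses the same strategy, her reward becomes

$$V(\pi^*, \pi^*) = (1-\delta)\sum_{t=0}^{\infty}\sum_{i \in S}\sum_{a \in A} \delta^t\, x_i(t)\, c_{i,a}(m^{\pi^*}(t))\, \pi^*_{i,a}(m^{\pi^*}(t))$$
$$= (1-\delta)\Big(\sum_{t=0}^{k-1}\delta^t + \sum_{t=k}^{\infty} 2\delta^t\Big) = 1 + \delta^k.$$

However, her best response to $\pi^*$ is not $\pi^*$ but the strategy where she plays D
all the time. Indeed, in the mean-field limit a single deviation does not change the
population distribution, so it triggers no punishment, and her total reward becomes

$$V(\pi^0, \pi^*) = (1-\delta)\Big(\sum_{t=0}^{k-1}\delta^t + \sum_{t=k}^{\infty} 3\delta^t\Big) = 1 + 2\delta^k.$$

Therefore, $\pi^*$ is not a mean-field equilibrium.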
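The rewards of this example are easy to reproduce. The Python sketch below (function names are ours) compares, for the payoffs 1, 2 and 3 given above, the reward of following the pattern, the reward of a one-shot deviation in the N-player game, where the deviation is detected and punished, and the reward of playing D forever in the mean-field limit, where it is not.

```python
def discounted_reward(head, tail_payoff, delta):
    """(1 - delta) * sum_t delta^t u_t, where u_t = head[t] for
    t < len(head) and u_t = tail_payoff afterwards."""
    total = sum((1 - delta) * delta**t * u for t, u in enumerate(head))
    return total + delta ** len(head) * tail_payoff


def follow_reward(k, delta):
    # D (payoff 1) during k rounds, then C (payoff 2) forever.
    return discounted_reward([1] * k, 2, delta)


def n_player_deviation_reward(k, delta, s=0):
    # Deviate at round k + s: payoff 3 once, then punished (payoff 1) forever.
    return discounted_reward([1] * k + [2] * s + [3], 1, delta)


def mean_field_deviation_reward(k, delta):
    # One player among infinitely many: no punishment is triggered, so
    # playing D forever yields 1 during k rounds and 3 afterwards.
    return discounted_reward([1] * k, 3, delta)
```

In this sketch the one-shot deviation yields $1 + 2\delta^k - 2\delta^{k+1}$ when it occurs at round k, which is below $1 + \delta^k$ as soon as $\delta \geq 1/2$, whereas the mean-field deviation always yields more than following.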


4.4. Finite Horizon Case

We now focus on mean-field games in which objects evolve in discrete time
over a finite horizon, from 0 to T. For this case, the population distribution is defined
by (14), which depends on the strategy of the mass $\pi$. We assume that Player 0 can
choose her own strategy $\pi^0$. The expected cost of Player 0 is

$$V(\pi^0, \pi) = \sum_{t=0}^{T} \sum_{i \in S} \sum_{a \in A} x_i(t)\, c_{i,a}(m^\pi(t))\, \pi^0_{i,a}(m^\pi(t)),$$

where $x_i(t)$ is the probability that Player 0 is in state i at time t. The evolution of
$x_i(t)$ over time is described in (15).


Player 0 plays a best response to a given population strategy $\pi$, which means that
she selects the strategy that minimizes her expected cost. We are interested in
proving the existence of a mean-field equilibrium, that is, of a strategy
that is a fixed point of the best-response function. In Section 4.2 we showed this
for the discounted case. In the finite-horizon case, the vectors have finite size and, as
a consequence, the desired result follows immediately from the same arguments as in
the discounted case.

Theorem 7 (Mean-Field Equilibrium Existence in Discrete Time and Finite Horizon
Case). Any discrete-time mean-field game with finite-horizon cost such that P
and c satisfy Assumption (A1) has a mean-field equilibrium.

Again, the proof mimics the proof of the analogous theorem in continuous time
over a finite horizon.


5. Illustrative Example: Epidemic Model with Vaccinations 

In this section, we study an epidemic model with vaccinations. As we will see,
this model satisfies the conditions of our existence theorem and, as a consequence, it has a
mean-field equilibrium. For this particular case, we show the existence of a pure
Nash equilibrium and compare the performance of this equilibrium with a centralized
allocation.

We consider a population of homogeneous players that evolve in continuous time
from time 0 to T. The state space of a player is {S, I, R, V}, whose elements respectively stand
for susceptible, infected, recovered and vaccinated. We denote by $m_S(t)$, $m_I(t)$,
$m_R(t)$ and $m_V(t)$ the proportions of the population that are, respectively, susceptible,
infected, recovered and vaccinated at time t.

The dynamics of one player is a Markov process that can be described as follows.
A player encounters other players at rate $\gamma$. If the initial player is susceptible and
the encountered player is infected, the first player becomes infected. An infected player
recovers at rate $\rho$. We also consider that susceptible players can get vaccinated
according to a strategy $\pi$, where $\pi(t) \in [0, \tau]$ is the vaccination rate at time t and
$\tau < \infty$. Once a player is
vaccinated or recovered, her state does not change. The Markovian behavior of a
player is displayed in Figure 1.



Figure 1: The dynamics of a player in the epidemic model. A player has four possible states: S 
(susceptible), I (infected), R (recovered) and V (vaccinated). 

We are interested in the analysis of this epidemic model at the mean-field limit.
When the number of players goes to infinity, the dynamics of the population is
given by the following system of differential equations, which is typical of the population
dynamics studied in [18] for the non-controlled case. This system is the mean-field
equation of the continuous-time model, written with the rate matrix Q of this example. When all
players apply the same strategy $\pi$, this system of differential equations is

$$\begin{cases}
\dot m_S(t) = -\gamma\, m_S(t)\, m_I(t) - \pi(t)\, m_S(t) \\
\dot m_I(t) = \gamma\, m_S(t)\, m_I(t) - \rho\, m_I(t) \\
\dot m_R(t) = \rho\, m_I(t) \\
\dot m_V(t) = \pi(t)\, m_S(t)
\end{cases}$$
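These dynamics are straightforward to integrate numerically. The Python sketch below uses an explicit Euler scheme and a population that vaccinates at the maximum rate before a given threshold time. The scheme, the step size, the function name and the initial proportion of recovered players (0.2) are our assumptions; the rates are the values reported in the numerical section, as we read them.

```python
import numpy as np


def simulate_sir(threshold, m0, gamma, rho, tau, T, dt=1e-4):
    """Explicit Euler integration of the mean-field SIR dynamics when the
    population vaccinates at rate tau before `threshold` and at rate 0 after.

    Returns the time grid and an array with columns (m_S, m_I, m_R, m_V).
    """
    steps = int(round(T / dt))
    traj = np.empty((steps + 1, 4))
    traj[0] = m0
    for k in range(steps):
        s, i, r, v = traj[k]
        pi = tau if k * dt < threshold else 0.0
        infect, recover, vacc = gamma * s * i, rho * i, pi * s
        traj[k + 1] = (s + dt * (-infect - vacc),
                       i + dt * (infect - recover),
                       r + dt * recover,
                       v + dt * vacc)
    return np.linspace(0.0, steps * dt, steps + 1), traj


# Assumed reading of the parameters: gamma = 73, rho = 36.5, tau = 10, T = 0.3.
times, traj = simulate_sir(threshold=0.05, m0=(0.4, 0.4, 0.2, 0.0),
                           gamma=73.0, rho=36.5, tau=10.0, T=0.3)
```

The four proportions sum to one at every step, since each flow leaving one compartment enters another.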

In [20], the authors develop an approximation of this epidemic model, based
on independence assumptions, and characterize the solution of the corresponding
mean-field game. In the following subsection, we show that the mean-field game
corresponding to this model is tractable and can be analyzed rigorously without
making approximations.

5.1. Mean-Field Game

We focus on a particular player, which we call Player 0. Player 0 chooses her
vaccination strategy $\pi^0$, where $\pi^0(t) \in [0, \tau]$. Let $x_i(t)$ be the probability that
Player 0 is in state i at time t, where $i \in \{S, I, R, V\}$. The quantity $x(t)$ satisfies
the following system of differential equations:

$$\begin{cases}
\dot x_S(t) = -\gamma\, x_S(t)\, m_I(t) - \pi^0(t)\, x_S(t) \\
\dot x_I(t) = \gamma\, x_S(t)\, m_I(t) - \rho\, x_I(t) \\
\dot x_R(t) = \rho\, x_I(t) \\
\dot x_V(t) = \pi^0(t)\, x_S(t)
\end{cases}$$

Using the foregoing notation, the expected individual cost of Player 0 is defined
as follows:

$$V(\pi^0, \pi) = \int_0^T \big(c_V\, \pi^0(t)\, x_S(t) + c_I\, x_I(t)\big)\, dt,$$

where $c_V$ is the vaccination cost and $c_I$ is the cost per unit time of being infected.


The rate at which a susceptible player becomes infected is linear in the proportion
of infected population. Further, the other rates and the instantaneous
costs do not depend on the population distribution. As a result, the conditions
of Assumption (A1) are satisfied in this model, which implies the existence of a
mean-field equilibrium. In the following, we show that it can also be characterized.

We call the best response to $\pi$, and denote by BR($\pi$), the set of vaccination strategies
that minimize the cost of Player 0 for a given population strategy $\pi$. We
model this minimization problem as a Continuous Time Markov Decision Process
with finite horizon T [25]. We uniformize the Continuous Time Markov Decision
Process and let $\mu > \max\{\gamma + \tau, \rho\}$ be the uniformization constant. We denote by
$J_S(t)$ (resp. $J_I(t)$) the optimal cost in the susceptible state (resp. the infected state)
at time t and thus, we write the optimality equations of the associated discrete-time
Markov Decision Process as follows: for all $t = 0, 1/\mu, \ldots, T - 1/\mu$,


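$$J_S(t) = \inf_{\pi^0(t) \in [0, \tau]} \Big[ G(t) + \frac{\pi^0(t)}{\mu}\Big(c_V - J_S\big(t + \tfrac{1}{\mu}\big)\Big) \Big], \qquad (16)$$

$$J_I(t) = \frac{c_I}{\mu} + J_I\big(t + \tfrac{1}{\mu}\big)\Big(1 - \frac{\rho}{\mu}\Big), \qquad (17)$$

where $J_S(T) = J_I(T) = 0$ and

$$G(t) = \Big(1 - \frac{\gamma\, m_I(t)}{\mu}\Big) J_S\big(t + \tfrac{1}{\mu}\big) + \frac{\gamma\, m_I(t)}{\mu}\, J_I\big(t + \tfrac{1}{\mu}\big). \qquad (18)$$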
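The optimality equations can be solved by backward induction on the uniformized grid. The Python sketch below is a minimal illustration under our own assumptions: the infected proportion is supplied as an array sampled every $1/\mu$ (here a hypothetical constant path), the function name is ours, and ties at $J_S = c_V$ are broken in favor of not vaccinating.

```python
import numpy as np


def best_response(m_I, gamma, rho, tau, c_V, c_I, mu):
    """Backward induction on the uniformized optimality equations.

    m_I[k] is the infected proportion at time k / mu (assumed given).
    Returns J_S, J_I and a best-response vaccination rate for each step.
    """
    n = len(m_I)
    J_S, J_I = np.zeros(n + 1), np.zeros(n + 1)   # terminal values are 0
    rate = np.zeros(n)
    for k in range(n - 1, -1, -1):
        J_I[k] = c_I / mu + (1 - rho / mu) * J_I[k + 1]
        p_inf = gamma * m_I[k] / mu               # infection probability
        G = (1 - p_inf) * J_S[k + 1] + p_inf * J_I[k + 1]
        gain = c_V - J_S[k + 1]                   # coefficient of pi0(t) / mu
        rate[k] = tau if gain < 0 else 0.0
        J_S[k] = G + rate[k] / mu * gain
    return J_S, J_I, rate


mu, T = 1000.0, 0.3
m_I = np.full(int(round(T * mu)), 0.4)   # hypothetical constant infected path
J_S, J_I, rate = best_response(m_I, gamma=73.0, rho=36.5, tau=10.0,
                               c_V=0.5, c_I=36.5, mu=mu)
```

On this toy input the computed rate equals $\tau$ on an initial interval and 0 afterwards.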
We know that BR($\pi$) coincides with the solution of these optimality equations.
We aim to show that it is of threshold type. Prior to that, we present the following
results.

Lemma 2. The functions $J_S$ and $J_I$ satisfy the following properties:

(a) $J_I(t)$ is decreasing in t.

(b) $J_I(t) \geq J_S(t)$ for all t.

(c) $J_S(t)$ is decreasing in t when BR($\pi$)(t) = 0.


Proof. (a) We prove that $J_I(t)$ is decreasing in t by induction on t. First,
we observe that $J_I(T - 1/\mu) = c_I/\mu > J_I(T) = 0$. Assume now that $J_I(t)$ is
decreasing in t for all $t = t_0 + 1/\mu, \ldots, T$. Subtracting (17) at times $t_0$ and $t_0 + 1/\mu$ gives
$J_I(t_0) - J_I(t_0 + 1/\mu) = (1 - \rho/\mu)\,\big(J_I(t_0 + 1/\mu) - J_I(t_0 + 2/\mu)\big)$. Using the induction
assumption and the fact that $1 - \rho/\mu$ is positive, we obtain that $J_I(t_0) > J_I(t_0 + 1/\mu)$
and the proof is complete.

(b) We show this result by induction on t. We first observe that $J_S(T - 1/\mu) \leq
J_I(T - 1/\mu)$. Assume now that the property holds for all $t = t_0 + 1/\mu, \ldots, T$. The proof ends
if we show that $J_S(t_0) \leq J_I(t_0)$. First, from (16), it follows that $J_S(t_0) \leq G(t_0)$,
where G is as defined in (18). From the induction assumption, which states
that $J_S(t_0 + 1/\mu) \leq J_I(t_0 + 1/\mu)$, we have that $G(t_0)$ is less than or equal to $J_I(t_0 + 1/\mu)$.
Finally, from Lemma 2(a), we know that $J_I(t)$ is decreasing in t and the
desired result follows.

(c) First, we observe that if BR($\pi$)(t) = 0, then $J_S(t) = G(t)$, where G is as
defined in (18). Besides, using Lemma 2(b), we obtain that $G(t)$ is greater than
or equal to $J_S(t + 1/\mu)$. Therefore, we have shown that, if BR($\pi$)(t) = 0,
$J_S(t) \geq J_S(t + 1/\mu)$.

□


We say that a strategy $\pi^0$ is a threshold strategy (or a strategy with threshold
$t_0$) if there exists $t_0 \in [0, T]$ such that

$$\pi^0(t) = \begin{cases} \tau & \text{if } t < t_0, \\ 0 & \text{if } t \geq t_0. \end{cases}$$

We are now ready for the following proposition.

Proposition 1. For any population strategy $\pi$, there exists a best response that
is a threshold strategy.

Proof. Let $\pi^0$ be a best-response strategy. We first notice from (16) that
$\pi^0(t) = 0$ if $J_S(t) < c_V$ and $\pi^0(t) = \tau$ if $J_S(t) > c_V$. We now distinguish two cases.

The first case is $J_S(0) < c_V$, which implies $\pi^0(0) = 0$. According to Lemma 2(c),
$J_S(t)$ is never higher than $c_V$, which implies that
$\pi^0(t) = 0$ for all t.

We now study the case $J_S(0) > c_V$. Since $J_S(T) = 0$, $J_S(t)$ must be less than $c_V$
at some time. Let $t_0$ be the first time such that $c_V > J_S(t_0 + 1/\mu)$. Then
$\pi^0(t) = \tau$ for all $t < t_0$. Moreover, we have that $\pi^0(t_0) = 0$ and, from Lemma 2(c),
it follows that $\pi^0(t) = 0$ for all $t > t_0$. □


This proposition implies that, for a population strategy $\pi$ with threshold t, there exists
a best response that is a threshold strategy. Let us denote by $t^{BR}(t)$ the threshold
of such a best-response strategy to the strategy with threshold t. We show the
following relation between the threshold of the population and that of Player 0.

Lemma 3. The threshold $t^{BR}(t)$ decreases when t increases.

Proof. We first observe that if t increases, then the vaccinated population
increases, which implies that the infected population $m_I(t)$ decreases.
From Lemma 2(b), we know that $J_I(t) \geq J_S(t)$ for all t. Thus, in (16), $m_I(t)$ is
multiplied by $J_I(t + 1/\mu) - J_S(t + 1/\mu)$, which is nonnegative. Therefore, if the
infected population $m_I(t)$ decreases, then $J_S(t)$ also decreases. Finally, from (16),
we have that if $J_S(t)$ decreases, then $t^{BR}(t)$ also decreases. □

A mean-field equilibrium for this epidemic model is a vaccination strategy $\pi^{MFE}$
such that, when the population chooses the vaccination strategy $\pi^{MFE}$, a selfish
Player 0 would also choose the same vaccination strategy $\pi^{MFE}$. Hence, a strategy
with threshold t is a mean-field equilibrium if $t^{BR}(t) = t$. From Lemma 3, we
have that the thresholds of both strategies meet at a unique point, which gives the
following proposition.

Proposition 2. There exists a pure mean-field equilibrium that is a threshold strategy.


We recall that our general existence theorem guarantees the existence of a mean-field equilibrium.
In addition, for this epidemic model, we prove not only the existence but also that
this equilibrium is pure and is a threshold strategy. This simplifies the numerical
computation of a mean-field equilibrium, which can be done by solving a fixed-point
equation for the threshold.
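One possible way to solve this fixed-point equation is a bisection on the population threshold, which relies on the monotonicity stated in Lemma 3. The Python sketch below is only one such scheme and not necessarily the one used for the figures: the Euler discretization with step $1/\mu$, the function names and the value of $\mu$ are our assumptions, and the rates are the values of the numerical section as we read them.

```python
import numpy as np


def infected_path(threshold, m0, gamma, rho, tau, n, mu):
    """Euler path of m_I on the grid k / mu when the population vaccinates
    at rate tau before `threshold` and not at all afterwards."""
    s, i = m0
    path = np.empty(n)
    for k in range(n):
        path[k] = i
        pi = tau if k / mu < threshold else 0.0
        s, i = (s + (-gamma * s * i - pi * s) / mu,
                i + (gamma * s * i - rho * i) / mu)
    return path


def best_response_threshold(m_I, gamma, rho, tau, c_V, c_I, mu):
    """Backward induction on the optimality equations; returns the time at
    which Player 0 stops vaccinating (0 if she never vaccinates)."""
    J_S, J_I, stop = 0.0, 0.0, 0
    for k in range(len(m_I) - 1, -1, -1):
        p_inf = gamma * m_I[k] / mu
        G = (1 - p_inf) * J_S + p_inf * J_I
        gain = c_V - J_S                 # vaccinating is optimal iff gain < 0
        if gain < 0 and stop == 0:
            stop = k + 1                 # last step at which she vaccinates
        J_S = G + (tau / mu * gain if gain < 0 else 0.0)
        J_I = c_I / mu + (1 - rho / mu) * J_I
    return stop / mu


def equilibrium_threshold(m0, gamma, rho, tau, c_V, c_I, T, mu, iters=30):
    """Bisection on t for the equation t_BR(t) = t."""
    n = int(round(T * mu))
    lo, hi = 0.0, T
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        path = infected_path(mid, m0, gamma, rho, tau, n, mu)
        br = best_response_threshold(path, gamma, rho, tau, c_V, c_I, mu)
        if br > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


common = dict(m0=(0.4, 0.4), gamma=73.0, rho=36.5, tau=10.0,
              c_I=36.5, T=0.3, mu=2000.0)
t_eq = equilibrium_threshold(c_V=0.5, **common)
t_eq_expensive = equilibrium_threshold(c_V=10.0, **common)
```

When the vaccination cost exceeds the largest possible infection cost $c_I/\rho$, Player 0 never vaccinates and the bisection collapses to a threshold of 0, which gives a simple sanity check.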
5.2. Centralized Control Strategy

We focus on a centralized control problem where the goal is to minimize the
expected cost of the population. Centralized strategies in this SIR model have been
studied in the literature, where the solution is shown to be the unique viscosity solution of a
Hamilton-Jacobi-Bellman equation. We show that the centralized control problem
is tractable and characterize its solution.

We denote by C($\pi$) the cost incurred in the system under the population vaccination
strategy $\pi$, i.e.,

$$C(\pi) = \int_0^T \big(c_I\, m_I(t) + c_V\, \pi(t)\, m_S(t)\big)\, dt.$$

The global optimum of the problem is the population strategy that minimizes
the total cost; we let

$$\pi^{opt} \in \arg\min_{\pi} C(\pi).$$

As for the mean-field equilibrium, a global optimum is a threshold strategy.

Proposition 3. The strategy that minimizes the total cost is a threshold strategy.


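Proof. For a given population strategy $\pi$ and $\varepsilon > 0$, we define $u_0(t) = \pi(t)\, m_S(t)$
and

$$u_1(t) = \begin{cases}
u_0(t) & \text{if } t < t_0, \\
u_0(t) - \varepsilon & \text{if } t \in [t_0, t_0 + \delta), \\
u_0(t) + \varepsilon & \text{if } t \in [t_0 + \delta, t_0 + 2\delta), \\
u_0(t) & \text{if } t \in [t_0 + 2\delta, T],
\end{cases}$$

satisfying $\int_0^T u_0(v)\,dv = \int_0^T u_1(v)\,dv$. We show that $u_0(t)$ is an improving strategy
compared with $u_1(t)$, which means that the cost incurred under $u_0(t)$ is less than
the system cost under $u_1(t)$. That is, if $m^0_S(t)$ and $m^0_I(t)$ are the proportions of
susceptible and infected population under strategy $u_0(t)$, and $m^1_S(t)$ and $m^1_I(t)$ are the
proportions of susceptible and infected population under strategy $u_1(t)$, then

$$\int_0^T \big(c_I\, m^0_I(t) + c_V\, u_0(t)\big)\, dt < \int_0^T \big(c_I\, m^1_I(t) + c_V\, u_1(t)\big)\, dt. \qquad (19)$$

Since $\int_0^T u_0(v)\,dv = \int_0^T u_1(v)\,dv$, this inequality holds if $m^0_I(t) \leq m^1_I(t)$ for all
t, with strict inequality after $t_0$. Let $m^0_I(t_0) = m^1_I(t_0) = i_0$ and $m^0_S(t_0) = m^1_S(t_0) = s_0$. We divide the proof into
three parts: (A) $m^1_I(t_0 + \delta) > m^0_I(t_0 + \delta)$, (B) $m^1_I(t_0 + 2\delta) > m^0_I(t_0 + 2\delta)$ and (C)
$m^1_I(t) > m^0_I(t)$ for $t \geq t_0 + 2\delta$.

(A) From the definition of $u_1(t)$, it follows that $\dot m^1_I(t_0) = \dot m^0_I(t_0)$ and $\dot m^1_S(t_0) =
\dot m^0_S(t_0) + \varepsilon$. Besides, for the second derivatives, we have that $\ddot m^1_I(t_0) = \ddot m^0_I(t_0) +
\gamma \varepsilon i_0$ and $\ddot m^1_S(t_0) = \ddot m^0_S(t_0) - \gamma \varepsilon i_0$.

As a result, using the Taylor expansion, we obtain that
$$m^1_I(t_0 + \delta) = m^0_I(t_0 + \delta) + \frac{\delta^2}{2}\, \gamma \varepsilon i_0 + O(\delta^3)$$
and
$$m^1_S(t_0 + \delta) = m^0_S(t_0 + \delta) + \delta \varepsilon - \frac{\delta^2}{2}\, \gamma \varepsilon i_0 + O(\delta^3).$$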

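(B) The first derivatives of the proportions of infected and susceptible population
at $t_0 + \delta$ satisfy $\dot m^1_I(t_0 + \delta) = \dot m^0_I(t_0 + \delta) + \delta \gamma \varepsilon i_0 + O(\delta^2)$ and $\dot m^1_S(t_0 + \delta) =
\dot m^0_S(t_0 + \delta) - \varepsilon - \delta \gamma \varepsilon i_0 + O(\delta^2)$. For the second derivatives, we have that
$\ddot m^1_S(t_0 + \delta) = \ddot m^0_S(t_0 + \delta) + \gamma \varepsilon i_0 + O(\delta)$ and $\ddot m^1_I(t_0 + \delta) = \ddot m^0_I(t_0 + \delta) - \gamma \varepsilon i_0 + O(\delta)$.
From these expressions, it follows that

$$m^1_S(t_0 + 2\delta) = m^0_S(t_0 + 2\delta) - \delta^2 \varepsilon \gamma i_0 + O(\delta^3), \qquad (20)$$

and

$$m^1_I(t_0 + 2\delta) = m^0_I(t_0 + 2\delta) + \delta^2 \varepsilon \gamma i_0 + O(\delta^3). \qquad (21)$$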

(C) We show this result by induction on t. First, we note that (20) and (21) are
satisfied. Then, we assume that at $t' \geq t_0 + 2\delta$, $m^1_I(t') = m^0_I(t') + \gamma \varepsilon i_0 \delta^2 + O(\delta^3)$
and $m^1_S(t') = m^0_S(t') + O(\delta^2)$ are satisfied. To finish the proof, we need to show
that these equations hold for $t' + \delta$.

The first derivative of the infected population at $t'$ satisfies $\dot m^1_I(t') =
\dot m^0_I(t') + O(\delta^2)$ and the second derivative satisfies $\ddot m^1_I(t') = \ddot m^0_I(t') + O(\delta^2)$. For the
susceptible population, we obtain that $\dot m^1_S(t') = \dot m^0_S(t') + O(\delta^2)$ and $\ddot m^1_S(t') =
\ddot m^0_S(t') + O(\delta^2)$.

Finally, using the previous expressions, the Taylor expansion gives
$m^1_I(t' + \delta) = m^0_I(t' + \delta) + \gamma \varepsilon i_0 \delta^2 + O(\delta^3)$
and $m^1_S(t' + \delta) = m^0_S(t' + \delta) + O(\delta^2)$. This finishes the proof.

From this result, we have that if we displace an infinitesimal part of the area under
$\pi(t)\, m_S(t)$ to the left, the system cost decreases. Therefore, proceeding recursively, we
conclude that the population strategy of minimal cost is a threshold strategy. □
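Since the optimum can be sought among threshold strategies, the centralized problem reduces to a one-dimensional search. The Python sketch below evaluates the system cost of a threshold strategy with an explicit Euler scheme and picks the best threshold on a grid. The scheme, the grid, the function names and the initial condition $m_S(0) = m_I(0) = 0.4$ are assumptions made for the illustration, with the rates of the numerical section as we read them.

```python
import numpy as np


def social_cost(threshold, m0, gamma, rho, tau, c_V, c_I, T, dt=1e-4):
    """Euler approximation of C(pi) for the threshold strategy that vaccinates
    at rate tau before `threshold` and not at all afterwards."""
    s, i = m0
    cost = 0.0
    for k in range(int(round(T / dt))):
        pi = tau if k * dt < threshold else 0.0
        cost += dt * (c_I * i + c_V * pi * s)
        s, i = (s + dt * (-gamma * s * i - pi * s),
                i + dt * (gamma * s * i - rho * i))
    return cost


def optimal_threshold(grid, **params):
    """Grid search for the threshold of minimal system cost."""
    costs = [social_cost(t, **params) for t in grid]
    best = int(np.argmin(costs))
    return grid[best], costs[best]


params = dict(m0=(0.4, 0.4), gamma=73.0, rho=36.5, tau=10.0,
              c_V=0.5, c_I=36.5, T=0.3)
grid = np.linspace(0.0, 0.3, 31)
t_opt, c_opt = optimal_threshold(grid, **params)
no_vaccination_cost = social_cost(0.0, **params)
```

A finer grid or a scalar minimizer could replace the grid search; the coarse grid is used here only to keep the sketch short.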
5.3. Numerical comparisons

We know that there exist a mean-field equilibrium and a global optimum that
are of threshold type. Hence, we now compare the equilibrium obtained in
Proposition 2 with the global optimum of Proposition 3. For this purpose, we consider
the following system parameters: $\rho = 36.5$, $\gamma = 73$, $\tau = 10$, $T = 0.3$, $c_I = 36.5$ and
$c_V = 0.5$.

Remark 2. The authors in [20] consider the previous system parameters except
for the infection cost, which is 1 instead of 36.5. It is easy to see that if the cost of
infection is $c_I$ times $\rho$, both models coincide. The authors, using their approximation,
obtain that the costs for the mean-field equilibrium and for the global optimum are,
respectively, 0.55 and 0.53. Using our approach, the resulting cost is 0.542 for the
mean-field equilibrium and 0.524 for the global optimum.




(b) Population dynamics (zoomed). 


Figure 2: Population dynamics under the equilibrium strategy (dashed line) and the global optimum
strategy (solid line). Three zones are displayed: (i) in the white region, the global optimum and
the equilibrium vaccinate with maximum rate; (ii) in the dark gray region, the global optimum
vaccinates with maximum rate, while the equilibrium does not vaccinate; and (iii) in the light gray
region, neither the global optimum nor the equilibrium vaccinates. I(0) = S(0) = 0.4.


We computed the globally optimal strategy and the mean-field equilibrium strategy. The
results are reported in Figure 2 in a simplex plot. In this figure, the simplex is
divided into three regions that represent the decisions taken by both policies at time
0, as a function of the initial state. In the white region, both strategies vaccinate
at maximum rate. In the dark gray region, the strategy of the global optimum is to
vaccinate at maximum rate and the strategy of the equilibrium is not to vaccinate.
In the light gray region, the strategy of the equilibrium and the global optimum is
not to vaccinate.

We also plot the trajectories corresponding to both strategies when the initial
proportions of infected and of susceptible population are both equal to 0.4
at time 0. In Figure 2 (see panel (b) for a zoomed view), we plot with a solid line the
behavior of the equilibrium vaccination strategy and with a dashed line the behavior
of the global optimum. We observe that, at the beginning, both strategies consist
in vaccinating at maximum rate. After some time, the equilibrium strategy stops
vaccinating, whereas the global optimum strategy keeps vaccinating. After
that, there is another instant at which the global optimum strategy stops vaccinating
and, from then on, no strategy vaccinates.

In this simulation we observe that the threshold at which the equilibrium changes
its strategy is smaller than that of the global optimum. We have compared these
thresholds over a large number of simulations with different parameters, and the
results indicate that, in all but degenerate cases, the thresholds do not coincide,
so that the price of anarchy of this model differs from 1 in all these cases.

5.4. Mechanism Design

In Figure 3, we compare the threshold of the optimal strategy with that of the
mean-field equilibrium strategy. For a given $c_V$, with the rest of the parameters fixed,
we denote by $t^{opt}(c_V)$ (resp. $t^{eq}(c_V)$) the threshold of the global optimum strategy
(resp. equilibrium strategy). It can be shown that in both cases, the thresholds are
decreasing in $c_V$: the more costly the vaccination, the fewer people vaccinate (for
the globally optimal situation or for the mean-field equilibrium).



Figure 3: Comparison of the threshold of the equilibrium and of the global optimum when cv varies 
from 0.01 to 1.21. 

Figure 3 confirms that the thresholds decrease with $c_V$ and also shows that the
thresholds are never equal for this range of parameters. This figure also suggests that
the optimal threshold is always larger than the mean-field equilibrium threshold.
This fact was already observed in [20] and suggests that, if the vaccination decisions
are left to individuals, then vaccination should be subsidized, by subtracting a subsidy p from
the vaccination cost in the equilibrium so that both thresholds coincide, i.e.,

$$t^{eq}(c_V - p) = t^{opt}(c_V).$$

For example, with the same parameters as in the simulation of Figure 3, we
observe that, for $c_V = 0.8$, the threshold of the global optimum is 0.034, while the
threshold of the equilibrium is 0. As can be seen, the threshold of the equilibrium
is 0.034 when $c_V = 0.45$. This simulation shows that, to encourage selfish individuals
to vaccinate optimally, vaccination should be subsidized so as to reduce its cost
by p = 0.35.
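Given the curve $t^{eq}(\cdot)$, the subsidy can be obtained by a bisection on p, since the equilibrium threshold is decreasing in the vaccination cost. The Python sketch below illustrates this. The function `find_subsidy` is ours, and `toy_t_eq` is a purely illustrative curve, not the data of Figure 3: it is linear, clipped at 0, and calibrated only so that its value at 0.45 is 0.034, as quoted in the example.

```python
def find_subsidy(t_eq, target_threshold, c_V, iters=60):
    """Bisection for p in [0, c_V] such that t_eq(c_V - p) = target_threshold,
    assuming t_eq is continuous and nonincreasing in the vaccination cost."""
    lo, hi = 0.0, c_V
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if t_eq(c_V - mid) < target_threshold:
            lo = mid          # subsidy too small: equilibrium stops too early
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical equilibrium-threshold curve, for illustration only.
def toy_t_eq(c):
    return max(0.0, 0.1 - c * (0.066 / 0.45))


p = find_subsidy(toy_t_eq, target_threshold=0.034, c_V=0.8)
```

On this toy curve the bisection recovers the subsidy of the example, that is, the difference between 0.8 and 0.45.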


6. Related Work 

Population games are sometimes also called mean-field games
because they are mean-field limits of static games. Here, we only discuss the mean-field
limit of dynamic games, as originally introduced by Lasry and Lions [23] or by
Huang, Malhamé and Caines.

Since these seminal works, a large variety of papers have
investigated mean-field games. Most of the literature performs an analysis of these
games based on the coupling of a Hamilton-Jacobi-Bellman equation with a
Fokker-Planck equation. Here, we are interested
in studying mean-field games with a finite number of states and a finite number of
actions. In this case, the analog of the Hamilton-Jacobi-Bellman equation is the
Bellman equation and the discrete version of the Fokker-Planck equation is the
Kolmogorov equation.

In this article, we analyze continuous as well as discrete time mean-field games.

Finite state space mean-field games in discrete time have received less attention.
They have been studied previously in a setting where the strategy of the players is the rate at
which they change between states. In our terms, this corresponds to the case where
the mixed action space is the set of all bounded rate matrices and $Q_{ij,a} = a$. This
gives each player the power to decide her dynamics independently of the state of
the others.

Continuous-time finite state space mean-field games have also been studied
previously. In these models, the players also control completely the transition rate
matrix. Our model is more general. It considers that the players may not have
such a power and that their actions only have a limited effect on their state. Here, the
transition rates Q may depend not only on the action taken by the player, but
also on the population distribution of the system. We claim that this scenario is
rather common in systems such as epidemic or belief propagation and the diffusion
of information, but also in other cases such as resource
allocation, where a player cannot use a resource already utilized by others.

Since our mean-field game model is a strict generalization of these previous
models, the existence of a mean-field equilibrium in those models can be seen as a corollary of our
main theorem. Furthermore, another important difference of our work with respect
to these continuous-time finite state space models concerns the cost functions. In
fact, some of these works assume that the cost of a player is strictly convex in her strategy,
and others consider uniformly convex functions. We note that these
models do not cover, for example, linear costs, which are important for applications.
In our approach, the only requirement on the cost is continuity with respect to the
population distribution (see Assumption (A1)).

Different authors have studied the convergence of the equilibria of N-player games
to mean-field equilibria, e.g. [27, 28]. Their model is different from ours
since they consider that the strategy of a player only depends on her internal state
(called stationary policies in [28]). Here we allow these policies to depend on time.
One related model does include state dynamics that depend on the population
distribution, but it only considers stationary strategies that do not depend on time
and hence cannot depend on the population dynamics.

Finally, our four models of dynamic games with a finite number of players do 
not face the issue of the order of play. Thus, we avoid two difficulties of dynamic 
games: the information structure of each player and the existence of a value [Tj. 
In our case, all players are similar, so the order of play is irrelevant, and we only 
consider the full information case (players know the strategy of the other players 
and their current state). Then, the continuity assumptions of the cost function and 
of the rate matrices are enough to ensure the existence of an equilibrium. 


7. Conclusions 

In this article, we introduce mean-field games with explicit interactions. They strike 
a good compromise between tractability (existence of an equilibrium) and modeling 
power (including propagation and congestion behaviors). This model consists 
of a finite state space mean field game where the transition rates of the objects and 
the cost function of a generic object depend not only on the actions taken but also 
on the population distribution. We also show that there exists a sub-class of Nash 
equilibria of N-player games that converge to mean-field equilibria when the number 
of players grows. Outside of this class, and in particular for all equilibria using 
the “tit for tat” principle, on which the Folk theorem is based, the convergence 
does not hold. 

For future work, we are interested in finding conditions ensuring the uniqueness 
of the mean-field equilibrium. We believe that monotonicity assumptions similar 
to assumptions in m are required to prove the existence of a unique mean-field 
equilibrium in this model. On the other hand, another interesting open question 
concerns the convergence of N-player equilibria to mean-field equilibria when the 
number of players grows large. We aim to characterize the largest sub-class of strategies 
where convergence holds. Obviously, this class includes all local strategies and 
excludes some Markovian ones. 